The following document is an assignment for the Data Visualization course at Poznan University of Technology. The course is taught by Dariusz Brzeziński in the 4th semester of the Artificial Intelligence bachelor's degree programme.
The assignment is an implementation of the grammar of graphics, intended to create rich visualizations from the data we were provided with. The data consists of two data sets, and we have chosen a visualization for each of them. As stated in the assignment description:
For each of the following visualizations, we provide a brief description and the reasoning behind it.
We credit the lecturer, Dariusz Brzeziński, for the interactive tables used in this report. They were built with the DT package.
The interactive tables work as intended in the HTML version of this document, but they are not visible in the PDF version. For that reason, the first rows of each data frame (the output of head()) are displayed instead.
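A minimal sketch of how this HTML/PDF fallback could be automated in the report's own R code. The helper name `show_table()` is hypothetical; `knitr::is_html_output()` and `DT::datatable()` are the existing knitr and DT functions. Outside an HTML render (for example when knitting to PDF, running the code interactively, or when either package is not installed) the sketch falls back to `head()`.

```r
# Hypothetical helper: interactive DT table in HTML output, plain head() otherwise
show_table <- function(df, n = 6) {
  # FALSE when not knitting, when knitting to PDF, or when knitr is unavailable
  html_out <- requireNamespace("knitr", quietly = TRUE) && knitr::is_html_output()
  if (html_out && requireNamespace("DT", quietly = TRUE)) {
    DT::datatable(df)  # interactive table, only rendered in HTML
  } else {
    head(df, n)        # static fallback, e.g. for the PDF version
  }
}
```

In a chunk this would be called as, for example, `show_table(read.csv("Dataset/Sectors/energy.csv"))`, so each table needs a single call regardless of the output format.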
You can access the project’s repository on GitHub - https://github.com/bujowskis/put-DV/tree/main/ass-2
For simplicity, we show only two of the 8 data sets: one representing the sets that include sentiment scores, and one representing the sets in which sentiment is missing.
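Which of the two groups a file belongs to can be checked directly on the data frame. A minimal sketch, assuming (as in the two tables that follow) that the column is named `Sentiment` and that a missing score is stored as `NA`; `has_sentiment()` and `pick_color_var()` are hypothetical helpers, not part of the original code, and the tiny data frames reuse values from those tables.

```r
# Hypothetical helper: TRUE if the data frame carries at least one sentiment score
has_sentiment <- function(df) {
  "Sentiment" %in% names(df) && any(!is.na(df$Sentiment))
}

# Hypothetical rule for choosing a treemap color variable from that check
pick_color_var <- function(df) if (has_sentiment(df)) "Sentiment" else "X1dC."

# Small stand-ins for the two kinds of files (values taken from the heads below)
with_sent    <- data.frame(Symbol = c("ANTM", "CTLT"), Sentiment = c(0.68, 0.66))
without_sent <- data.frame(Symbol = c("BBBY", "F"), Sentiment = c(NA, NA))

has_sentiment(with_sent)      # TRUE
has_sentiment(without_sent)   # FALSE
pick_color_var(without_sent)  # "X1dC."
```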
##    X Symbol                 Name   Volume X1dC.  X1dV.   Open   High  Close Volume.1 Sentiment
## 1  7   ANTM               Anthem   927200  1.44  -3.25 461.80 465.03 464.86   927200      0.68
## 2 16   CTLT             Catalent   800600  3.90 -41.72 128.03 128.26 124.49   800600      0.66
## 3 53    SYK  Stryker Corporation  1169700 -3.48 -69.48 267.41 268.87 268.42  1169700      0.63
## 4 52    STE               Steris   351600 -1.07 -36.01 243.11 243.76 242.57   351600      0.62
## 5 59   VTRS              Viatris 11959600 -1.13  -4.98  13.60  14.30  14.21 11959600      0.60
## 6 41    MCK McKesson Corporation   643400  0.04  -3.25 247.54 248.44 248.10   643400      0.56
##    X Symbol                                Name    Volume X1dC.  X1dV.  Open  High Close Sentiment
## 1  5   BBBY              Bed Bath & Beyond Inc. 105519200 -5.30  82.21 30.00 30.06 21.71        NA
## 2 29      F                  Ford Motor Company  87711400 -0.38 -13.84 16.84 16.90 15.97        NA
## 3 14    CCL          Carnival Corporation & plc  67608300 -2.25  -0.28 17.30 17.48 15.53        NA
## 4 60   NCLH Norwegian Cruise Line Holdings Ltd.  39182900 -3.77   3.61 17.17 17.38 15.38        NA
## 5 49   LCID                   Lucid Group, Inc.  35016600 -4.62   4.76 22.94 24.41 23.17        NA
## 6 21   DKNG                     DraftKings Inc.  29355000  3.71  -6.35 20.58 20.89 18.05        NA
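In the treemap calls that follow, `vSize = "Volume"` makes each tile's area proportional to the stock's traded volume, while `vColor` only controls the fill. As a rough illustration (not the treemap package's internal code, which additionally runs a tiling algorithm), the fraction of the area a tile receives is its volume divided by the total. The sketch uses only the six volumes from the first head above, so the shares are illustrative; a full sector file has more rows.

```r
# Volumes of the six rows from the first head() output above
volume <- c(ANTM = 927200, CTLT = 800600, SYK = 1169700,
            STE = 351600, VTRS = 11959600, MCK = 643400)

# Fraction of the treemap area each tile would get (area proportional to vSize)
area_share <- volume / sum(volume)

# Percentages, rounded; VTRS alone takes about 75.4% of the area of these six tiles
round(100 * area_share, 1)
```

This is why a single high-volume stock can visually dominate a sector treemap even when its price change (the color) is small.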
# Idea: show the change in closing price of all stocks of a particular sector in one plot.
# For that we use a treemap: tile area encodes trading volume and tile color encodes
# the change in close price (or the sentiment score).
library(treemap)
# Show one sector colored by price change (energy) and one colored by sentiment (IT)
energy <- read.csv("Dataset/Sectors/energy.csv")
IT <- read.csv("Dataset/Sectors/it.csv")
treemap(energy, index = c("Symbol"), vSize = "Volume", vColor = "X1dC.",
        type = "value", border.col = "black", border.lwds = 1,
        title = "Energy Sector", title.legend = "Change in Close price in %")
treemap(IT, index = c("Symbol"), vSize = "Volume", vColor = "Sentiment",
        type = "value", border.col = "black", border.lwds = 1,
        title = "IT Sector", title.legend = "Sentiment Score")
##   Ticker.1 Ticker.2 Correlation.Value
## 1       GS      JPM         0.7955952
## 2     AAPL     MSFT         0.7069591
## 3      AXP      JPM         0.6833357
## 4       KO       PG         0.6553540
## 5      CRM     MSFT         0.6464821
## 6      HON      MMM         0.6289362
The choice of static visualization for the correlations was obvious from the beginning: a correlation matrix drawn as a heat map. For that reason, we did not prepare a sketch for it.
If a correlation value were missing, it would suffice to store NA for that pair, which results in an empty (white) tile in the visualization.
However, no values are missing in this data set, so this feature cannot be seen here.
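The loops in the code that follows compute row positions in the long `expand.grid` data frame by hand. The same idea can be sketched more compactly in base R, assuming `cor_data` has the columns `Ticker.1`, `Ticker.2` and `Correlation.Value` shown in the head above: fill a named square matrix, mirror it, keep one triangle and put 1 on the diagonal. Any pair absent from the data stays `NA`, which is exactly the missing-tile behaviour just described. The function name `build_cor_matrix()` is hypothetical, and this is an alternative formulation, not the code used for the plot.

```r
# Hypothetical helper: long table of pairwise correlations -> lower-triangular matrix
build_cor_matrix <- function(cor_data) {
  t1 <- as.character(cor_data$Ticker.1)  # as.character guards against factor columns
  t2 <- as.character(cor_data$Ticker.2)
  tickers <- union(t1, t2)
  m <- matrix(NA_real_, length(tickers), length(tickers),
              dimnames = list(tickers, tickers))
  # Character-matrix indexing: each row of idx selects one (row, column) cell by name
  idx <- cbind(t1, t2)
  m[idx] <- cor_data$Correlation.Value
  m[idx[, 2:1, drop = FALSE]] <- cor_data$Correlation.Value  # correlation is symmetric
  m[upper.tri(m)] <- NA  # hide the redundant upper triangle
  diag(m) <- 1           # a stock is perfectly correlated with itself
  m
}

# Only the six pairs from the head above, so most cells stay NA (missing tiles)
pairs <- data.frame(
  Ticker.1 = c("GS", "AAPL", "AXP", "KO", "CRM", "HON"),
  Ticker.2 = c("JPM", "MSFT", "JPM", "PG", "MSFT", "MMM"),
  Correlation.Value = c(0.7955952, 0.7069591, 0.6833357, 0.6553540, 0.6464821, 0.6289362)
)
m <- build_cor_matrix(pairs)
round(m, 2)
```

Indexing by ticker names instead of computed offsets removes the dependency on the order in which `expand.grid` lays out its rows; the matrix can still be converted to long form for ggplot2 afterwards.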
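library(ggplot2)
library(plotly)
library(dplyr) # provides %>% and mutate() used below
# cor_data (columns Ticker.1, Ticker.2, Correlation.Value) is assumed to be loaded already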
# get all unique tickers
ut <- data.frame(tickers=union(cor_data$Ticker.1, cor_data$Ticker.2))
rut <- data.frame(tickers=rev(ut$tickers)) # save a reversed copy for later
# create dataframe of all combinations
df <- expand.grid(ticker1=rut$tickers, ticker2=ut$tickers)
# read the correlation values
df$val <- NA # correlation not specified, cell will be colored white (na.value)
# expand.grid varies ticker1 fastest, so the row of a pair is
# n * (position of ticker2 in ut - 1) + position of ticker1 in rut
for (i in 1:nrow(cor_data)) {
  # read from the dataset
  df$val[length(ut$tickers)*(match(cor_data$Ticker.1[i], ut$tickers) - 1) +
           match(cor_data$Ticker.2[i], rut$tickers)] = cor_data$Correlation.Value[i]
  # it's bidirectional
  df$val[length(ut$tickers)*(match(cor_data$Ticker.2[i], ut$tickers) - 1) +
           match(cor_data$Ticker.1[i], rut$tickers)] = cor_data$Correlation.Value[i]
}
j = length(ut$tickers)
for (i in 0:(length(ut$tickers) - 1)) {
  # remove upper triangle (including the diagonal, which is set to 1 below)
  for (k in 0:i) {
    df$val[j - k] = NA
  }
  j = j + length(ut$tickers)
}
for (i in 0:(length(ut$tickers) - 1)) {
  # correlation = 1 between the same stock
  df$val[length(ut$tickers) + i*(length(ut$tickers) - 1)] = 1
}
# text for tooltip
df <- df %>%
  mutate(text = paste0(ticker1, "\n", ticker2, "\n", "Val: ", val))
# Heatmap
p = ggplot(df, aes(ticker1, ticker2, fill=val)) +
  geom_tile() +
  geom_text(aes(label=round(val, 2)),
            size=6
  ) +
  #scale_x_discrete(guide=guide_axis(n.dodge=2)) +
  theme(axis.title.x=element_blank(), # remove x axis title
        axis.title.y=element_blank(), # remove y axis title
        text=element_text(size=20),
        axis.text=element_text(size=20),
        legend.key.size = unit(2, 'cm'),
        legend.key.height = unit(2, 'cm'),
        legend.key.width = unit(2, 'cm'),
        axis.text.x=element_text(angle=45, hjust=1)
  ) +
  # values outside the limits (e.g. negative correlations) are also drawn with na.value
  scale_fill_gradient2(low="white", high="blue",
                       limits=c(0, 1),
                       na.value="white"
  ) +
  ggtitle("Stocks correlation matrix")
p